$X_{ij}$ is the $j^{th}$ claim from the $i^{th}$ risk
$\bar{X_{i}}$ is the average claim from the $i^{th}$ risk
$m(\theta) = E(X_{ij}|\theta)$ is a random variable as $\theta$ is a random variable
$s^2(\theta)=Var(X_{ij}|\theta)$ is also a random variable
$\sigma^2=E(s^2(\theta))$
$\nu^2=Var(m(\theta))$
$\mu=E(m(\theta))$
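For a concrete illustration of these quantities (an example only, not part of the derivation): if $X_{ij}|\theta \sim Poisson(\theta)$ and $\theta \sim Gamma(\alpha, \beta)$ with rate parameter $\beta$, then $m(\theta)=\theta$ and $s^2(\theta)=\theta$, so
$\mu=\frac{\alpha}{\beta}, \quad \sigma^2=E(\theta)=\frac{\alpha}{\beta}, \quad \nu^2=Var(\theta)=\frac{\alpha}{\beta^2}$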
Postulate that $m(\theta)$ can be estimated by a linear function of the data, $a_0 + a_1 X_1 + a_2 X_2 + ...+a_nX_n$,
and try to find the values of $a_0, a_1,...,a_n$ which give us the best estimate of $m(\theta)$
We wish to minimise $f=E\left[(m(\theta)-a_0 - a_1 X_1 - a_2 X_2 - ...-a_nX_n)^2\right]$
Calculate $\frac{\partial f}{\partial a_0}=E\left[2(m(\theta)-a_0 - a_1 X_1 - a_2 X_2 - ...-a_nX_n)(-1)\right]$
and set to zero to find the minimum
$E\left[2(m(\theta)-a_0 - a_1 X_1 - a_2 X_2 - ...-a_nX_n)(-1)\right]=0$
$2 E(m(\theta)) - 2a_0 - 2a_1E(X_1)-...-2a_nE(X_n)=0$
Then, since each $X_j$ has the same conditional distribution given $\theta$, the tower law gives $E(X_j)=E(E(X_j|\theta))=E(m(\theta))=\mu$, and so:
$a_0=\mu\left(1-\displaystyle\sum_{j=1}^{n}a_j \right)$
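Note that this choice of $a_0$ is exactly the unbiasedness condition: $E\left(a_0+\displaystyle\sum_{j=1}^{n}a_jX_j\right)=a_0+\mu\displaystyle\sum_{j=1}^{n}a_j=\mu=E(m(\theta))$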
Result 1: $E(X_jX_k)=\nu^2+\mu^2$ for $j\neq k$
First apply the Tower Law
$E(X_jX_k)=E(E(X_jX_k|\theta))$
Then the standard result $E(XY)=Cov(X,Y)+E(X)E(Y)$ gives
$E(X_jX_k)=E\left[Cov(X_j, X_k|\theta) + E(X_j|\theta).E(X_k|\theta)\right]$
$E(X_jX_k)=E\left[Cov(X_j, X_k|\theta) + m(\theta).m(\theta)\right]$
$E(X_jX_k)=E\left[Cov(X_j, X_k|\theta)\right] + E(m^2(\theta))$
As $X_j$ and $X_k$ are conditionally independent given $\theta$, the conditional covariance is zero and the first term vanishes
Then the definition of variance, applied to the second term, gives
$E(X_jX_k)=Var(m(\theta)) + \left[E(m(\theta))\right]^2$
$E(X_jX_k)=\nu^2+\mu^2$
QED
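As a check in the Poisson-Gamma example: for $j\neq k$, $E(X_jX_k)=E\left[E(X_j|\theta)E(X_k|\theta)\right]=E(\theta^2)=\frac{\alpha}{\beta^2}+\frac{\alpha^2}{\beta^2}=\nu^2+\mu^2$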
Result 2: $E(X_k^2)=\sigma^2 + \nu^2+\mu^2$
First apply the Tower Law
$E(X_k^2)=E(E(X_k^2|\theta))$
$E(X_k^2)=E\left[Var(X_k|\theta)+\left[E(X_k|\theta)\right]^2\right]$
$E(X_k^2)=E\left[s^2(\theta)+m^2(\theta)\right]$
Then, using $E(m^2(\theta))=\nu^2+\mu^2$ from the proof of Result 1,
$E(X_k^2)=\sigma^2 + \nu^2+\mu^2$
QED
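In the same example: $E(X_k^2)=E(\theta+\theta^2)=\frac{\alpha}{\beta}+\frac{\alpha}{\beta^2}+\frac{\alpha^2}{\beta^2}=\sigma^2+\nu^2+\mu^2$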
Result 3: $E(X_k m(\theta))=\nu^2+\mu^2$
First apply the Tower Law
$E(X_k m(\theta))=E(E(X_k m(\theta)|\theta))$
then, conditional on $\theta$, $m(\theta)$ is a constant and can be taken outside the inner expectation, so:
$E(X_k m(\theta))=E(m(\theta).E(X_k|\theta))$
$E(X_k m(\theta))=E(m^2(\theta))$
$E(X_k m(\theta))=\nu^2+\mu^2$
QED
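In the example, $m(\theta)=\theta$, so $E(X_k m(\theta))=E(\theta E(X_k|\theta))=E(\theta^2)=\nu^2+\mu^2$, agreeing with Result 1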
Back to $f=E\left[(m(\theta)-a_0 - a_1 X_1 - a_2 X_2 - ...-a_nX_n)^2\right]$
Calculate $\frac{\partial f}{\partial a_k}=E\left[2(m(\theta)-a_0 - a_1 X_1 - a_2 X_2 - ...-a_nX_n)(-X_k)\right]$
And solve $\frac{\partial f}{\partial a_k}=0$
$E(m(\theta)X_k)-a_0E(X_k)-a_kE(X_k^2)-\displaystyle\sum_{j=1,j\neq k}^{n}a_j E(X_jX_k)=0$
Applying Results 1, 2 and 3, together with $E(X_k)=\mu$:
$\nu^2+\mu^2-a_0\mu-a_k(\sigma^2 + \nu^2+\mu^2)-\displaystyle\sum_{j=1,j\neq k}^{n}a_j (\nu^2+\mu^2) =0$
Substituting $a_0=\mu\left(1-\displaystyle\sum_{j=1}^{n}a_j \right)$ and collecting terms, the $\mu^2$ terms cancel and this reduces to
$\nu^2-a_k\sigma^2-\nu^2\displaystyle\sum_{j=1}^{n}a_j=0$
This holds for every $k$, so all the $a_k$ are equal; setting $a_k=a$ gives $\nu^2-a\sigma^2-na\nu^2=0$, and hence
$a_k=\frac{\nu^2}{\sigma^2+n \nu^2}$
So we have: $a_0=\mu\left(1-\displaystyle\sum_{j=1}^{n}a_j \right)$ and $a_k=\frac{\nu^2}{\sigma^2+n \nu^2}$
The best linear estimate of $m(\theta)$ is then $\hat{m}(\theta)=a_0 + \displaystyle\sum_{j=1}^{n}a_j X_j$
$\hat{m}(\theta)=\left(1-\frac{n\nu^2}{\sigma^2+n\nu^2}\right)\mu+\frac{n\nu^2}{\sigma^2+n\nu^2}.\bar{X}$
$\hat{m}(\theta)=(1-Z)\mu + Z.\bar{X}$ where $Z=\frac{n\nu^2}{\sigma^2+n\nu^2}$ is the credibility factor
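As a numerical sanity check, here is a minimal sketch in Python, assuming the Poisson-Gamma example above; in that case the credibility estimate is known to coincide exactly with the Bayes posterior mean $(\alpha+\sum_j X_j)/(\beta+n)$, so the two should agree to floating-point precision.

```python
import numpy as np

# Poisson-Gamma example: X_ij | theta ~ Poisson(theta), theta ~ Gamma(alpha, rate beta)
alpha, beta, n = 3.0, 2.0, 10

mu = alpha / beta        # mu = E[m(theta)] = E[theta]
sigma2 = alpha / beta    # sigma^2 = E[s^2(theta)] = E[theta] (Poisson variance)
nu2 = alpha / beta**2    # nu^2 = Var[m(theta)] = Var[theta]

rng = np.random.default_rng(42)
theta = rng.gamma(alpha, 1.0 / beta)   # draw one risk (numpy uses scale = 1/rate)
x = rng.poisson(theta, size=n)         # n claims from that risk

Z = n * nu2 / (sigma2 + n * nu2)       # credibility factor; here Z = n / (n + beta)
credibility = (1 - Z) * mu + Z * x.mean()

bayes = (alpha + x.sum()) / (beta + n) # exact posterior mean for Poisson-Gamma

print(credibility, bayes)              # the two values agree
```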